-
Notifications
You must be signed in to change notification settings - Fork 277
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add extractor configurations for Amharic Language #759
Add extractor configurations for Amharic Language #759
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great 👍 , except for a minor comment on the date time parser
"ሰኔ" -> 10, | ||
"ሐምሌ" -> 11, | ||
"ነሐሴ" -> 12, | ||
"ጳጉሜ" -> 13 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think 13
would be an invalid month value here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in addition, google translate provides this as a translation to the above
"September" -> 1,
"October" -> 2,
"November" -> 3,
"December" -> 4,
"January" -> 5,
"February" -> 6,
"March" -> 7,
"April" -> 8,
"May" -> 9,
"June" -> 10,
"July" -> 11,
"August" -> 12,
"Pagume" -> 13
not sure how accurate this translation is or if there is a calendar difference but, in the general case January
should be mapped to 1
not September
(and accordingly the other values)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Ethiopian calendar system has 13 months. we use a different calendar system :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, I thought so... :)
the thing is that this needs to map to the Gregorian calendar. Values with month 13
will fail to get parsed and we will loose that triple, maybe map that to 12
or skip that line? Not sure what would result in fewer errors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Among other things...
google translate provides this as a translation
It would seem that this ought to result in a bug report (from/by a properly knowledgeable reporter such as @Meti-Adane) against Google Translate, as the 13 Amharic month names (with twelve lengths being 30 days and one length being 5 or 6 days; and with New Year's Day on September 11 or 12, depending on leap years) obviously cannot be directly translated to the 12 Gregorian month names (with lengths of 28, 29, 30, or 31 days; and with New Year's Day always on January 1).
As to the mapping here, the handling of Hebrew months (closely tied to the lunar cycle; twelve months of 29 or 30 days, plus a thirteenth leap month every few years) might provide some valuable hints.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jimkont — "this needs to map to the Gregorian calendar"... but why? It seems to me that this PR reveals a significant flaw in a number of systems. There ought to be a way to express a date in any calendar, which some systems might offer to translate to one or more other calendars, including Gregorian, Amharic, Hebrew, etc. Is translation among some of these easy? Lots of systems will support them. Is translation among some of these hard? Fewer systems will support them, or there will be a few libraries produced to handle such translations.
Net of all this -- I think there ought to be an issue about non-Gregorian calandar ingestion and preservation. This is not too different from the (ongoing) efforts to losslessly handle data using multiple geocoordinate systems for Earth plus other celestial bodies (Moon, Mars, etc.).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the prompt @TallTed , I created an issue here: #761
@Meti-Adane we could still merge this PR and possibly only remove the date mappings until the issue is resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jimkont Well noted. I have implemented a date converter for Ethiopian calendar (Ethiopian to Gregorian). I will push the change list in a separate PR to make the review easier.
No description provided.